Evidence worksheets

Evidence Worksheet 1 for “Prokaryotes: The Unseen Majority”

Learning Objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General Questions

What were the main questions being asked?

How many microbes are there? Where are these microbes habituated? How much carbon to they amass too?

What were the primary methodological approaches used?

  • Extrapolation of microbial counts in “representative” samples of specific habitats (from other studies/research… some are unpublished?)

    • from this, they use estimation of total # of microbes to arithmetically estimate the amount of carbon, also using a representative estimated weight/cell and carbon/cell
  • original methodology for countin microbes assumed to be direct microscopic counts, measures of turbidity orciable counts

    • nextgen sequencing not a thing yet when this paper was published

Summarize the main results or findings.

  • estimated 4-6*10^6 prokaryotic cells contributing to 350-550Pg of C - HUGE portion of Earth’s carbon

  • distribution of microbes: ** subsurface >> soil > aquatic >>>>>>> air, animal, other habitats

  • huge store of nutrients - 10 fold more P and N than plants

Do new questions arise from the results?

  • How have these numbers change if the study were repeated today? What about with new, different methods, ie. nextgen sequencing? Are the results still the same?

Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

  • Where did these numbers come from? What studies did they get these numbers from? How did they get data from unpublished studies that are never referenced?

  • If they explained their numbers/calculations/assumptions, that would have been a lot more helpful, rather than the reader guessing/trying to figure it out themselves

  • Figures/tables were useful, especially for summarizing numbers across habitats and drawing comparisons

Evidence Worksheet_02 Life and the Evolution of Earth’s Atmosphere

Learning Objectives

Comment on the emergence of microbial life and the evolution of Earth systems

General Questions

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

    • 4.6 billion years ago
      • solar system begins with one or more local supernova explosions
      • 4.5Ga possible collision leading to moon and seasons/tides
    • 4.2 billion years ago
      • point at which earth could have permanent habituation (or even 4.0ga or less)
    • 3.8 billion years ago
      • until now, earth is constantly bombareded so that all H20 is vapourized or temperature is >110 degrees C
    • 3.75 billion years ago
      • evidence from Isua belt, oldest rocks, evidence of oceans and banded conformations indicating anoxygenic photosynthesis which implies considerable speciation - oldest probable life!
      • however dating is controversial and carbonate is not necssarily the age of the host rock
    • 3.5 billion years ago
      • rubisco signature implies global oxygenic photosynthesis and evolution of cyanobacteria
      • confirmed life exists for at least this long
    • 3.0 billion years ago
      • geologic evidence for archaea, stromatolites
    • 2.7 billion years ago
      • well developed stromatolites found in many places
      • ancestral eukaryotes appear to predate 2.7Ga
      • hydrocarbon biomarkers found, including 2-alpha-methylhopanes are characteristic of cyanobacteria, implying oxygenic photosynthesis is occuring
    • 2.2 billion years ago
      • redbeds do not exist before this point - suggests oxidizing conditions after this time
      • sharp spike in oxygen level, and possibly the appearance of complex eukaryotes
    • 2.1 billion years ago
      • switch from a microaerobic early atmosphere to a oxis air at this point
    • 1.3 billion years ago
      • multicellular life, first multicellular organisms!
    • 550,000 years ago
      • cambrian explosion!! life expansion pack!
    • 200,000 years ago
      • plants and animal on land
  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean
      • hot, external CO2 and vapourized H2O
      • cold glacial temperatures when not bombarded (due to young sun)
      • water hydrolyzes to H2 and O2 and H2 escapes
    • Archean
      • rich in phosphate
      • warmth and UV protection
      • subaerial and subaqueous hydrothermal vents rich in metals
      • sulphur cycle (modern or not?)
    • Precambrian
      • same as above
    • Proterozoic
      • oxygen rapidly increasin, therefore [CH4] in atmosphere decreasing - get glaciation, becomes very cold
      • CO2 also likely decreasing due to photosynthetic microbes
    • Phanerozoic
      • ~22% O2 due to photosynthesis (plants + bacteria)
      • relatively less CO2 than before on a large time scale3
      • N aroudn the same through this point

Evidence Worksheet 3 for “The Anthropocene”

Learning Objectives

Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General Questions

What were the main questions being asked?

  • have humans changed the Earth to the extent that geological deposits have a distinct signature from the previous epochs? If so, when was this stratigraphic signal recognizable worldwide?

What were the primary methodological approaches used?

  • by reviewing other primery literature, 1. define reasonable stratigraphic signals to use as markers to recognize the Anthropocene (and making them consistent with those used for earlier periods)
  • Describe signals and use them to place the anthropocene in earth’s timeline

Summarize the main results or findings.

  • key markers of anthropogenic change have increased, supporting an Anthropocene epoch - this includes concrete, plastics, global black carbon, plutonium, nitrates, CO2, CH4, temperature and extinction :(
  • new anthropogenic materials have persistent and widespread remains, previously unseen at such high levels - due to manufacturing (plastics, aluminum, concrete), and from fossil fuel burning (black carbon, inorganic ash spheres, spherical carbonaceious particles), products of mining, waste disposal, construction and urbanization
    • both increased greatly after 1950s - great acceleration event
  • sedimentary proccesses modified due to more N and P inventories, which leaves distinct rock layering in lakes and ice nitrate levels
  • geochemical signatures, including polyaromatic HCs, polychlorinated BPs, pesticide residues, 207/206Pb ratios, starting in 1945-1950
  • radiogenic signatures and radionuclides in sediments in ice due to nuclear weapons testing, resulting in excess 13C
  • carbon cycle perturbations
  • sea level and temperature rapidly increasing
  • all of the above points are unique from the Holocene, or unexplainable variance in Holocene’s history. This means that all are due to human acticity, and will continue to increase, therefore Anthropocene is well supported, and probably can be considered to start in the mid 1900s

Do new questions arise from the results?

  • What comes after the Anthropocene? Have we brought upon our own demise? If not, are some of these changes reversable? Do we want to reverse these changes? Or do we give up on going back to how things were, and try to adapt to changes instead - spaceship Earth?

Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

  • figures were very useful, allowed condensed view for evidence to claims
  • some jargon used, especially in quaternary stratigraphic units section
  • section headers were used amazingly, and allowed quick identification/navigation of essential evidence/points

Evidence worksheet_04 “Bacterial Rhodopsin Gene Expression”"

Learning Objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

  • Discuss the relationship between microbial community structure and metabolic diversity
  • Evaluate common methods for studying the diversity of microbial communities
  • Recognize basic design elements in metagenomic workflows

General Questions

What were the main questions being asked?

  • what is the physiological basis of light activated grwoth stimulation in photorhodopsin (PR) containin microbes?
    • what gene(s) encode a PR-based photosystem
    • what are the full functions and roles of the many microbial marine PRs?
      • role of light in this? Is an H+ gradient required for ATP synthesis?
  • can they recover a function PR system using functional metagenomic screening?
  • can a single genetic event enable acquisition of phototropic capabilities

What were the primary methodological approaches used?

  • functional screening on marine picoplankton large-insert genomic library (fosmid)
  • sequencing transposon-insertion clones

Summarize the main results or findings.

  • genomic analysis of PR photosystem expressing clones:
    • both clones derived from alphaprotobacteria
    • PR genes encode a protein characteristics of blue-light absorbing PRs and orange pigmentation
    • clones also carry predicted six-gene operon, encoding putative retinal/B-carotene biosynthesis genes
  • genetic and photypic analysis of PR photosystem
    • PR gene disruption results in no orange pigmentation and questionably low retinol
  • single genetic event can result in acquisition of phototrophic capabilities, which explains ubiquity of PR photosystems amonst microbial taxa

Do new questions arise from the results?

  • Although in this case the proteins/metabolites are not toxic, the expression system they used in this study to conduct functional screening would not be effective if the proteins/metabolites are toxic. Thus, are cell-free expression systems the future of functional screening?

Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

  • Good background given, because without it I wouldn’t have been able to understand this paper

Evidence Worksheet 5 for “Extensive mosaic structure”

Learning Objectives

  • Evaluate the concept of microbial species based on environmental surveys and cultivation studies.

  • Explain the relationship between microdiversity, genomic diversity and metabolic potential

  • Comment on the forces mediating divergence and cohesion in natural microbial communities

General Questions

What were the main questions being asked?

  • What is the complete genome of uropathogenic straing CFT073
  • what are the differences between strains of e. coli? How are the strains different, and are these differences significant? Is this difference neutral, or relevant to pathogenicity?
  • how much difference is there between the strains?

What were the primary methodological approaches used?

  • 3 strains of e. coli obtained, CFT073 (uropathogenic), EDL933 (enterohemorrhagic), and MG1655 (laboratory string)
  • isolate and culture CFT073, create shotgun library with PCR and primer walking (old technique)
  • assemple, annotate and compare the seqs of the 3 strains (other 2 strains already have sequences)
    • predict proteins compared to NCBI database and deduce evolutionary basis of pathogenecity

Summarize the main results or findings.

  • sequence for uropathogenic strain determined and basic information of uropathogenic strain on genome organization collected
  • identified distinct codon usage in island genes suggesting acquisition by HGT
  • uropathogenic strain CFT073 has specific islands, ~ 10% genes shared with EDL933
    • 2/3 have unknown functions or may be phage/insertion element associated
    • 1/3 are encoding iron update, fatty acid biosynthesis genes, adhesins, phosphetransferases, ABC type transporter
      • adhesins!! important shared role in pathogenicity?
  • difference in diseases potential between uropathonic and enterohemorrhagic strins determined
    • enterohemmorrhagic strin has a type III secretion system and phage/plasmid encoded virulence genes
    • uropathogenic strain has specific adhesins, secreted autotransporteds and phase switch recombinases
  • only 39.2% similarity between the 3 strains of the same “species”!! COre metabolic pathways and housekeeping is similar tho, and the reamining 60% of diversity is from HGT

Do new questions arise from the results?

  • Is our definition of “species” relevant when it comes to microbes? As seen here species encompasses all of the pathogenic and non pathogenic strains of ecoli… is this definition too broad?
    • can it be improved? Or should we just do away with it
  • Alternatively, do we keep the species, and just keep talking about them in strains… but then what is the purpose of have species classifier?

Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

  • heavy with jargon at times
  • expansion of implications of results on species concepts would be better, since this is very interesting and is only briefly mentioned

Problem sets

Problem set 01

Learning objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific Questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.
    • soil, seawater, & sediment/soil subsurface
    • in short: capacity for life: subsurface >>> aquatic > soil
      • subsurface (>10cm marins, >8m terrestrial): total = 3.8*10^30 cells
      • aquatic: 1.18*10^29 cells
        • freshwater: 1.31*10^26 cells
        • saline lakes: 1.0*10^26 cells
        • marine: 1.18*10^29 cells
      • soil: 2.56*10^29 cells
  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
    • in the upper 200m of the ocean, estimated to be 36010^26 cells, from average density of 510^5 cells/mL
    • marine cyanobacterium density = 4*10^4 cells/mL
    • the ratio of these two numbers of marine cyanobacterium to estimated prokaryotic cell abundance in the upper 200m is 0.08, or 8%. ie. 8*% of all prokaryotic ce4lls in the upper 200m of the ocean are marine cyanobacterium
    • 8% is a considerable amount, meaning that a significan portion of the upper oceanic microbes that photosynthesize - fixing carbon, and likely make up a large portion of oceanic C fixation, which convert CO2 from atmosphere to organic carbon, which can also be used to support other microbes/life
      • from this carbon, also contributes to the high turnover rates seen
  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?

    • autotroph: provides own organic molecules by fixing CO2
    • heterotroph: need to get organic molecules from environment
    • lithotroph: do not need organic molecules; can use inorganic sources for energy
  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

    • according to text, ~4km, since temperature reaches 125 degrees, which is close to upper temp. for prokaryotic life
      • primary limiting factor for life is temp
    • however, mariannas trench is 11km deep - could be life here?
  • Based on information provided in the text your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
    • 57-77km - highest altitude range where microbes found
    • primary limiting factor is carbon, becomes more sparse as altitude increases
  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
    • given range of -4km(according to text) to 57-77km (highest altitude range according to text)
    • from this, this would give us a vertical distance of 61-88km
  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
    • divide population by turnover time and convert to years eg. for marine autotrophs:
((2.9*10^27)/1.5)*365
## [1] 7.056667e+29
  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
    • higher C assimilation –> higher turnover
      • this is more carbon available due to assimilation, means more carbon available for growth
    • with depth in the ocean
      • increasing depth usually means less carbon assimilation, less carbon available, therefore lower turnover rates (ie. energy through photosynthesis)
      • nutrient availability
      • also temperature can affect metabolism
    • between terrestrial and marine habitats - easier to get nutrients in marine habitat since its in solution, movement of nutrients and more microbes = more nutrients which supports more microbes…
  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)
    • (4e-7)^4 * 3.6e28cells/16days * 1day/24hours = 2.4/hour of 4 simultaneous mutations in shared genes
  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
    • HUGE genetic diversity & adaptive potential
    • besides point mutations, there are also indels, inversions, duplications and transpositions for mutational changes
    • on top of this, they also have horizontal gene transfer (HGT) through conjugation or viral infection
  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
    • positive correlation between abundance, diversity, and metabolic potential

Problem set_02 Microbial Engines

Learning objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
  1. geophysical (mainly acid/base rxns)
    • tectonics resupply C, S, P
    • geothermal processes resuply C, S and metals
  2. biochemical (biologically driven)
    • mainly redox reactions
    • photosynthesis drives oxidation of electron donors (requires energy in the form of light rather than through bond energy)
    • respiration drives reduction of electron donors
    • forms a cycle, although not perfect, resupplemented by geophysical processes
  • Why is Earth’s redox state considered an emergent property?
    • originally earth’s chemistry was mainly acid/base and not redox, due to abiotic processes, but with the emergence and addition of microbes, this lead to increase in a redox state world-wide, since microbes increase both oxidized and reduced forms of metabolites
  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
    • reversibility allows both oxidation and reduction
    • eg. inorganic C + energy <-> organic C
      • therefore, in areas or niches where inorganic C and energy is available, this can be reduced into organic C form through photosynthesis
      • In areas where organic C is more abundant, this can be oxidized into energy and inorganic C
    • on a small scale, there is a cycle between photosynthesis and respiration
    • on a large scale, elements and nutrients are cycled into different forms
    • thermodynamic barriers are overcome by co-operation or distributed metabolic networks, and using light electrons as reducing power
  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
    • N2 -> NH4+ (fixation)
      • group of bacteria that reduces/fixes N2 to NH4+, uses nitrogenase
    • NH4+ -> NO2 (nitrification, also fixes CO2 to CH2O)
      • group of bacteria/archaea that oxidizes NH4+ to NO2, and consumes oxygen in the process
    • NO2 -> NO3 (nitrification)
      • group of bacteria that oxidizes NO2 to NO3, requires oxygen as well, not in competition with previous group due to different electron donor
    • No3 -> N2 (denitrification)
      • reduces NO3 to N2 again through anaerobic oxidation of organic matter
    • in regards to climate change, the burning of fossil fuels, increase of nitrogen fertilizers, etc, have increased the amoun of N in the cycle which leads to an increased accumulation of NH4+, which means increased acidity, and leads to NO3 leaching so that this results in dead fish, and animals/insects also die
  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
    • metabolic diversity would lead to microbial diversity because different types of metabolism would lead to different niches being occupied, and thus difference microbes would arise as a result
    • there is some machinery that is highly conservered amonst all groups, which includes basic electron transductin processes, protein translation, DNA replication, etc.
    • there are also different genes that can lead to increased microbial and metabolic diversity, such as carryon genes and boutique genes (specific to niche)
  • On what basis do the authors consider microbes the guardians of metabolism?
    • they are ubiquitous and holy the intructions and capacity for metabolism
    • and in the event of extinction withou complete annihilation they will maintain and guard all the genes and mechanisms necessary for metabolism, and eventually re-establish worldwide
    • HGT can also help maintain pathways

Problem set_03 Metagenomics: Genomic Analysis of Microbial Communities

Learning objectives:

Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)

Specific Questions:

  • How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?

    • divisions are based off of 16S rRNA genes into operational taxonomic units [@ward]
    • 89 bacterial and 20 archaeal phyla are recognized by small subunit ribosomal RNA databases, but this could be much higher, up to 1500 bacterial
      • many of these have no culture representatives [@solden]
  • How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
    • EBI metagenomics: 1555 projects
    • joint genome institute: 10378
    • there appears to be many different websites all hosting their own projects, only included ones that were found in top hits on google. There doesn’t seem to be one site that keeps track of all projects, and it is likely that some of these projects are duplicated between different sites
    • there are many different environments that the metagenome sequencing projects are sourced from, as submissions for these sites seem to be open, and any sort of environment may be sourced from. For example, from the joint genome institute, there are engineered environments (eg. bioreactor, solid waste, wastewater), environmental environments (eg, air, aquatic, terrestrial), and host-associated environments (eg. human, insecta)
  • What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
  • What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
    • phylogenetic anchors are transferred vertically while functional gene anchors can be tranferred horizontally (through horizontal gene transfer). Additionally, phylogenetic anchors are usually single copy, and carry phylogenetic information which makes it possible for tree reconstruction. This is in contrast to functional anchors which are not useful for phylogeny (due to horizontal transfer) and are more useful for identifying functions.
    • in metagenome analysis, depending on the type of analysis being conducted and the kind of question being asked, different anchors may be used. For example, for constructin a phylogeny, considering “who” is there, diversity, and maybe even predicting what evolution will result in, phylogenetic anchors are more useful. In a different analysis, where the question asked is more concerned with the function of a community of microbes, it is more useful to use functional gene anchors.
  • What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
    • metagenome sequence binning: grouping contigs/scaffolds/genes into OTUs.
    • two approaches for sequence binning:
      1. composition based
        • used shared features such as GC content, or markov models based on kmer frequences
        • this doesn’t need ref seqs so works well for unknown microbes
        • however, it may be weak to misclasification errors
      2. similarity based (comparing to reference sequences from known OTUs)
        • this is rather simple approach
        • however, weak to the fact that undiscovered microbes, or microbes with no reference sequence may be missed
    • some risks: binning doesn’t always resolve strains, may be an inaccurate view of diversity, and also result in missing less abundant or smaller sequences
    • some opportunities: potential to reconstruct genetic and metabolic diversity accurately - draw biological conclusions from sequence data
  • Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?

    • single cell sequencing (inspired from co-op experience)
      • this could result in identification of rare microbes that carry important boutique genes
      • however, this technique is weak to limited DNA, and technical difficulties in analysis of the huge amount of data that this would produce

Problem set_04 Fine-scale phylogenetic architecture

Learning objectives:

  • Gain experience estimating diversity within a hypothetical microbial community

Outline:

In class Day 1:

  1. Define and describe species within your group’s “microbial” community.
  2. Count and record individuals within your defined species groups.
  3. Remix all species together to reform the original community.
  4. Each person in your group takes a random sample of the community (i.e. devide up the candy).

Assignment:

  1. Individually, complete a collection curve for your sample.
  2. Calculate alpha-diversity based on your original total community and your individual sample.

In class Day 2:

  1. Compare diversity between groups.

Part 1: Description and enumeration

Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.

Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.

Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.

For example, load in the packages you will use.

#To make tables
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 3.4.4
library(knitr)
#To manipulate and plot data
library(tidyverse)

Then load in the data. You should use a similar format to record your community data.

example_data1 = data.frame(
  number = c(seq(1, 44)),
  name = c("vines", "yellow small brick", "green small brick", "blue small brick", "red small brick", "yellow brick large", "green brick large", "blue brick large", "red brick large", "purple skittle", "green skittle", "orange skittle", "red skittle", " yellow skittle", "green mikes", "light red mikes", "dark red mikes", "yellow mikes", "orange mikes", "clear bear", "green bear", "yellow  bear", "orange bear", "light red bear", "dark red bear", "blue M&M", "brown M&M", "yellow M&M", "orange M&M", "green M&M", "red M&M", "kiss candy", "sour bear", "sour fruit", "sour hexa", "sour bottle", "blue sour swirl", "red sour swirl", "green jube", " orange jube", "red jube", "yellow jube", "purple jube", "wine"),
  characteristics = c("red", "yellow", "green", "blue", "red",
                      "yellow", "green", "blue", "red", "purple",
                      "green", "orange", "red", "yellow", "green",
                      "light red", "dark red", "yellow", "orange", "clear",
                      "green", "yellow", "orange", "light red", "dark red",
                      "blue", "brown", "yellow", "orange", "green",
                      "red", "silver", rep("NA", 4), "blue", "red",
                      "green", "orange", "red", "yellow", "purple", "NA"),
  occurences = c(8, 0, 1, 1, 1, 0, 0, 0, 0, 13, 6, 19, 6, 5, 6, 11, 13, 3, 7, 5, 3, 3, 4, 2, 6, 10, 7, 3, 15, 6, 3, 2, 0, 0, 2, 0, 0, 1, 1, 0, 1, 0, 0, 1),
  #total is 175
  community_occurences = c(14,4,2,3,6,1,0,1,1,40,46,39,31,31,37,39,38,27,33,16,19,18,16,16,16,59,30,33,62,28,29,16,3,2,6,3,2,1,5,5,7,4,3,9)
)

Finally, use these data to create a table.

For your community:

  • Construct a table listing each species, its distinguishing characteristics, the name you have given it, and the number of occurrences of the species in the collection.
example_data1 %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
number name characteristics occurences community_occurences
1 vines red 8 14
2 yellow small brick yellow 0 4
3 green small brick green 1 2
4 blue small brick blue 1 3
5 red small brick red 1 6
6 yellow brick large yellow 0 1
7 green brick large green 0 0
8 blue brick large blue 0 1
9 red brick large red 0 1
10 purple skittle purple 13 40
11 green skittle green 6 46
12 orange skittle orange 19 39
13 red skittle red 6 31
14 yellow skittle yellow 5 31
15 green mikes green 6 37
16 light red mikes light red 11 39
17 dark red mikes dark red 13 38
18 yellow mikes yellow 3 27
19 orange mikes orange 7 33
20 clear bear clear 5 16
21 green bear green 3 19
22 yellow bear yellow 3 18
23 orange bear orange 4 16
24 light red bear light red 2 16
25 dark red bear dark red 6 16
26 blue M&M blue 10 59
27 brown M&M brown 7 30
28 yellow M&M yellow 3 33
29 orange M&M orange 15 62
30 green M&M green 6 28
31 red M&M red 3 29
32 kiss candy silver 2 16
33 sour bear NA 0 3
34 sour fruit NA 0 2
35 sour hexa NA 2 6
36 sour bottle NA 0 3
37 blue sour swirl blue 0 2
38 red sour swirl red 1 1
39 green jube green 1 5
40 orange jube orange 0 5
41 red jube red 1 7
42 yellow jube yellow 0 4
43 purple jube purple 0 3
44 wine NA 1 9
  • Ask yourself if your collection of microbial cells from seawater represents the actual diversity of microorganisms inhabiting waters along the Line-P transect. Were the majority of different species sampled or were many missed?

    • high abundance species were sampled, but many of the lower abundance ones were missed

Part 2: Collector’s curve

To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.

To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.

For example, we load in these data.

example_data2 = data.frame(
  x = c(seq(1,175)),
  y = c(1,1,2,2,3,3,4,5,6,7,8,rep(9,8),
        10,11,12,rep(13,13),rep(14,6),rep(15,19),rep(20,6),rep(21,5), rep(22, 6), rep(23, 11), rep(24, 13), rep(25, 3), rep(26, 7), rep(27, 5), rep(28, 3), rep(29, 3), rep(30, 4), rep(31, 2), rep(32, 6), rep(33, 10), rep(34, 7), rep(35, 3), rep(36, 15), rep(37,6))
)

And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.

ggplot(example_data2, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'

For your sample:

  • Create a collector’s curve for your sample (not the entire original community).
  • Does the curve flatten out? If so, after how many individual cells have been collected?
    • the curve does not flatten out
  • What can you conclude from the shape of your collector’s curve as to your depth of sampling?
    • from the observation that the curve does not flatten out, and seems to still have a fairly high slope, we can conclude that sampling is incomplete and many species remain undetected, and the depth of sampling was not sufficient. On comparison with the community data we can confirm this is true, as a large number of the lower abundance species were not sampled at all

Part 3: Diversity estimates (alpha diversity)

Using the table from Part 1, calculate species diversity using the following indices or metrics.

Diversity: Simpson Reciprocal Index

\(\frac{1}{D}\) where \(D = \sum p_i^2\)

\(p_i\) = the fractional abundance of the \(i^{th}\) species

For example, using the example data 1 with 3 species with 2, 4, and 1 individuals each, D =

species1 = 2/(2+4+1)
species2 = 4/(2+4+1)
species3 = 1/(2+4+1)

1 / (species1^2 + species2^2 + species3^2)
## [1] 2.333333

The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects the number of species present (richness) and the relative proportions of each species with a community (evenness), this metric is a diveristy metric. Consider that a community can have the same number of species (equal richness) but manifest a skewed distribution in the proportion of each species (unequal evenness), which would result in different diveristy values.

  • What is the Simpson Reciprocal Index for your sample?
total_indivs = 175
squared_sum = 0
for (occurence_value in example_data1$occurences){
  squared_sum = squared_sum + (occurence_value/total_indivs)^2
}
simpson_reciprocal_index = 1/squared_sum
simpson_reciprocal_index
## [1] 18.93939

my simpson reciprocal index is 18.93939

  • What is the Simpson Reciprocal Index for your original total community?
total_indivs = 801
squared_sum = 0
for (occurence_value in example_data1$community_occurences){
  squared_sum = squared_sum + (occurence_value/total_indivs)^2
}
simpson_comm_reciprocal_index = 1/squared_sum
simpson_comm_reciprocal_index
## [1] 23.98777

the reciprocal index for the original total community is 23.98777

Richness: Chao1 richness estimator

Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.

\(S_{chao1} = S_{obs} + \frac{a^2}{2b})\)

\(S_{obs}\) = total number of species observed a = species observed once b = species observed twice or more

So for our previous example community of 3 species with 2, 4, and 1 individuals each, \(S_{chao1}\) =

3 + 1^2/(2*2)
## [1] 3.25
  • What is the chao1 estimate for your sample?

    31 + (7^2)/(2*24)
    ## [1] 32.02083
    • the chao1 estimate is 32.02083
  • What is the chao1 estimate for your original total community?

43 + (4^2)/(2*39)
## [1] 43.20513
* the chao1 estimate is 43.20513

Part 4: Alpha-diversity functions in R

We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.

library(vegan)

First, we must remove the unnecesary data columns and transpose the data so that vegan reads it as a species table with species as columns and rows as samples (of which you only have 1).

example_data1_diversity = 
  example_data1 %>% 
  select(name, occurences) %>% 
  spread(name, occurences)
example_data1_comm_diversity = 
  example_data1 %>% 
  select(name, community_occurences) %>% 
  spread(name, community_occurences)
example_data1_comm_diversity

Then we can calculate the Simpson Reciprocal Index using the diversity function.

diversity(example_data1_diversity, index="invsimpson")
## [1] 18.93939
diversity(example_data1_comm_diversity, index="invsimpson")
## [1] 23.98777

And we can calculate the Chao1 richness estimator (and others by default) with the the specpool function for extrapolated species richness. This function rounds to the nearest whole number so the value will be slightly different that what you’ve calculated above.

specpool(example_data1_diversity)
specpool(example_data1_comm_diversity)

In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.

For your sample:

  • What are the Simpson Reciprocal Indices for your sample and community using the R function?
    • using the R function, the simpson reciprical index is 18.93939 for the sample, and 23.98777 for the community
  • What are the chao1 estimates for your sample and community using the R function?
    • using the R function, the chao1 estimate for the sample is 32 and 43 for the community
    • Verify that these values match your previous calculations.
      • this is seen to match up with previous calculations

Part 5: Concluding activity

If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.

  • How does the measure of diversity depend on the definition of species in your samples?
    • depending of the definition of species within the samples, this could either inflate or deflate the corresponding diversity indices. For example, when defining a “species” as a particular type of candy and colour, this resulted in a higher diversity index than when classifying a “species” only based on type of candy alone
  • Can you think of alternative ways to cluster or bin your data that might change the observed number of species?
    • as mentioned previously, perhaps a different definition of species would only look at the type of candy it is, rather than using both the colour and type of candy to define a species. Alternatively, an even larger binning of the candy to define a species may only consider the colour of the candy alone.
  • How might different sequencing technologies influence observed diversity in a sample?
    • different sequencing techniques may lead to different sequencing errors that can artificially inflate the number of species “found” later downstream when processing the reads such as in pyrosequencing. Alternatively, it is also possible that other sequencing techniques would artificially deflate diversity, such as sanger sequencing, due to its ability to only sequence higher abundance species. Therefore it is entirely possible that different sequencing technologies alone may influence the observed diversity in a sample due to different sequencing depths and error rates.

Writing assignments

Writing assignment Module 1

Prompt: “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.

Introduction

“Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” Historically, microbes have existed and fared well long before the existence of humans, with fossils dating back to 0.2 million years ago (Ma) [1], and probably existing before 3500 Ma [2]. However, for humans, I believe that we are unable to live without the global catalysis and environmental transformations that they provide, while microbial life is probably content to go on without us in whatever state we put the world in. In this paper, the microbes’ irreplaceable and essential role in supporting human life will be examined. This involves discussion of why microbes would be impossible to replace with human technology, both in quantity (abundance, catalysis on a global scale) and quality (complexity).

Quantity

Firstly, simply due to the sheer number and abundance of microbes, microbes would be extremely difficult or impossible to replace. While it is common knowledge that microbial life is everywhere, quantitative classification is still largely disregarded. Based on the work of Whitman, Coleman, and Wiebe in 1998, the number provided for prokaryotic cells on Earth was approximated to be 4-6 x 1030 cells, which is responsible for 350-550Pg of carbon [3]. This number is so massive, it is easy to trivialize, since we generally do not have a concept of this number. To put this in perspective, consider that if we were to count every grain of sand on earth in all deserts and beaches, which would be roughly equal 7.5 x 1018 grains of sand. This does not even come close to the number of prokaryotic cells on earth. However this might have been obvious, considering the fact that microbes are everywhere and also magnitudes smaller in size. More comparably, the number of stars in the universe has been approximated to be 7 x 1022. This still does not come close to the number of prokaryotic cells. Each one of these cells is constantly under natural selection pressures, evolving, and adapting to our ever changing climate, responsible for catalyzing reactions, providing energy as a primary producer, and acting as a huge storage of all of the information used to carry out all these roles and processes. Therefore, to begin considering a life without microbes, or replacing microbes, we would have to somehow be able to do the equivalent work of 4-6 x 1030 cells, a number that we cannot even realistically conceptualize.

Even if we were to put the microbes’ role as a large primary producer aside, that still leaves the role microbes play in the cycling of elements on a global scale. In particular, Nitrogen, which is essential for synthesis of nucleic acids and proteins, and Carbon, an essential element for all organic life. In the name of monetary profit, and convenience, humans have made a huge impact on the Nitrogen cycle in the 20th century. This includes the reduction of the inert form of Nitrogen, N2 to NH4+, the implementation of new agricultural practices to boost yields, and burning of fossil fuels [3].

In contrast to these technological advances, very little attempt has been made towards keeping the natural order and balance of Nitrogen, or other elements in general. As a result of this misuse, there have been significant effects to the environment, such as nitrogen loss and subsequent eutrophication of coastal waters - where diversity of life is dying out due to the newly created hypoxic environments [3].

As indicated by this detrimental effect of humans on the Nitrogen cycle, humans are in fact capable of manipulating the cycling of elements on a global scale, just like the microbes. However, unlike the microbes, humans in general do not care for what state an element is in unless it is profitable to do so. This means that only the elements that we deem “useful” will be kept in supply, while others will be in whatever form or state is most convenient. This also means that the balance of elements, or preserving of the cycles, including the Nitrogen cycle, or any other cycle is of no concern at all. There is no respect, or “land ethic” that frowns upon the most efficient commercial processes that put elements out of balance and in disarray instead. Only the microbes are concerned about keeping a balance, since they are not motivated by profits, but rather, by survival. As they grow and adapt to niches, an equilibrium is achieved. If not for the microbes’ essential role in cycling these elements, we would already be out of Nitrogen and extinct.

An even more alarming example of element consumption is Carbon. The carbon cycle has a “leak” where a small amount of organic carbon sinks to the ocean floor and becomes part of sediment, and is unable to re-enter the cycle [4]. This leak, paired with the industrialization of the human race and burning of fossil fuels has resulted in massive emissions of carbon dioxide into the atmosphere, which could potentially result in a mass extinction within the next century [4]. In consideration of the coordination required to alter the both the nitrogen and carbon cycles (ie. All of humanity’s fertilization of crops, and burning fossil fuels), it is highly unlikely that the human race would be able to concert a coordinated effort towards innovation and preserving the balance of the different states of the elements until it is too late. If we are unable to sacrifice financial profit to replace even an aspect of the microbes role in element cycling to prevent extinction, it is impossible to even begin considering replacing the microbes role in cycling every single element on Earth.

Quality

Even if humans were somehow miraculously united and agreed to put all resources towards sustainable practices and research to maintain the balance of the mentioned elements, the complex roles that microbes play in an interconnected ecological web still remain to be understood. After all, how can we replace something if we don’t even know what it does, or how it works? Through trial and error, evolution has given us an order of things that is sustainable, yet we are throwing things out of balance, and not stopping. Additionally, in a naïve attempt to fix some of our wrongdoings, we have made things worse, due to a lack of understanding of the complexity involved with a problem. For example, to manage the invasive cane toads originally brought to Australia, the Lungworms were introduced, since the introduced cane toads were overrunning the native tree frog population [5]. Unfortunately, lungworms also ended up killing the tree frog numbers as well [5]. It is wholly possible that humans trying to fix the cycling of elements on a global scale by counterbalancing with another human-introduced intervention could actually make things worse than before, similar to the cane toad example. Additionally, microbes are also involved in their own unique communities and part of every single complex ecological web on earth. Trying to replace the microbes in these roles without a complete understanding would be a mistake, and a complete gamble. Everything is interconnected, and human research has not yet come to a mature enough level to even begin understanding the intricacies and nuances of the delicate balances in the world around us. In research, we are still only capable of culturing and studying only a small fraction of microbial life compared to the total diversity seen in nature, and still a far ways away from understanding all the interactions that the microbes have [6].

Conclusion

In short, humans are still a long way away from replacing microbial life, while microbial life is content to exist without humans, even if we happen to go extinct. Massive innovation and leaps in our understanding are required before consideration of replacing microbial life is even possible. This innovation would have to replace the numerous microbes, and be able capture the massive storage of information that is contained within microbes that is constantly expanding to account for the changing environment, and dynamic equilibrium state of the world’s elements. In addition to this, the complex interactions that the microbes are responsible for across different environments would also need to be understood before replacement or alteration of the sensitive ecological webs can occur. If microbial life were to suddenly disappear tomorrow, or even within the next few decades, humans would almost certainly face imminent extinction. Fortunately, microbes are fairly resilient, and despite our efforts towards extinction, they are here to stay.

References

  1. Whitman, W. B., Coleman, D. C., & Wiebe, W. J. (1998). Prokaryotes: the unseen majority. Proceedings of the National Academy of Sciences, 95(12), 6578-6583.
  2. Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature, 409(6823), 1083-1091.
  3. Canfield D. E., Glazer A. N. & Falkowski P. G. (2010). The Evolution and Future of Earth’s Nitrogen Cycle. Science, 330(192), 192-196.
  4. Chu J. (2017, September 20). Mathematics predicts a sixth mass extinction. Retrieved February 16, 2018, from http://news.mit.edu/2017/mathematics-predicts-sixth-mass-extinction-0920
  5. Zak, E. (2017, February 27). 5 Invasive Species Humans Introduced on Purpose. Retrieved February 16, 2018, from https://www.care2.com/causes/5-invasive-species-humans-introduced-on-purpose.html
  6. Stewart E. J. (2012). Growing Unculturable Bacteria. Journal of Bacteriology, 194(16), 4151-4160.

Writing assignment Module 3

Prompt: “Discuss the challenges involved in defining a microbial species and how HGT complicates matters, especially in the context of the evolution and phylogenetic distribution of microbial metabolic pathways. Can you comment on how HGT influences the maintenance of global biogeochemical cycles through time? Finally, do you think it is necessary to have a clear definition of a microbial species? Why or why not?”

It is human nature to classify things and put things into neatly organized bins that reflect the nature and characteristics of the object, and for science and research, proper naming enables communication regarding the object in question. With organisms at a macroscopic scale, species can be defined and distinguished mostly based on whether they can interbreed. However, when it comes to microbes, there are various challenges involved in defining what a microbial species is. This is further complicated by horizontal gene transfer (HGT). In this essay, the challenges surrounding the definition of microbial species will be discussed further, along with the role of HGT in the maintenance of global biogeochemical cycles through time. Additionally, I will also discuss why it is necessary to be able to have a clear definition of what a microbial species is, even with all the challenges that come with it.

Some of the challenges for defining a microbial species comes from the fact that many of the existing methods in use for organisms at a larger scale simply do not apply to microbes, and that existing methods are only operational. For example, the existence of asexual organisms makes it difficult to use interbreeding as a classification. Even with the use of genetic similarity to define species, including DNA hybridization and rRNA gene similarity, definitions for the cut off between what defines a species is only operational and arbitrary, and may not be reflective of actual evolutionary processes and species labelling [1]. Using this method of defining a species, there is shown to be tension between the taxonomic label, and actual similarity. For example, in the work by Thompson et al, extensive diversity was found within a group of coastal bacterioplankton, with greater than 99% 16S ribosomal RNA identity to Vibrio splendidus [2]. Specifically, they examined Hsp60 and found that it varied greatly in genome size and even carbon substrates [2]. An even greater amount of diversity was found in the work by Welch et al, in Escherichia coli, where it was shown that strains were adapted to completely different ecological niches, and only 39.2% of their combined set of proteins were common to all three strains. Whether this sort of diversity should be allowed in a single species definition is also a challenge in defining a microbial species. However, with newer approaches, such as using amplicon sequence variants (ASVs), over operational taxonomic units (OTUs) may improve this labelling [1].

When defining a microbial species, horizontal gene transfer must also be considered, as HGT enables sharing of genes. This makes it difficult to examine evolution because evolution of microbial metabolic pathways, with HGT, may not occur linearly, as compared to other forms of events that could create genetic distance, such as mutations, insertions, and deletions. This increases the number of possible evolutionary pathways possible, and thus makes it more difficult to discern what the “true” evolutionary path is. While HGT may make phylogeny more difficult, it is an essential process in maintaining the global biogeochemical cycles. Entire metabolic pathways may be horizontally transferred, which protect the cycle, and through HGT, microbes are appropriately called the “guardians of metabolism” [3].

Even with all of the challenges in defining a microbial species, it is essential to have one. This is because definition of a species allows for communication surrounding the group of organisms, and ensures that there is a label that is consistent between different discussions. This allows integration and findings from other studies or work to be integrated into current and future work on the group of microbes, which is most important. It would be ideal that a labelled species indicates some sort of evolutionary origin, or biological function, but for now it is most important that the labels are consistent. Consistency is even more important when it comes to a clinical setting, so that doctors may communicate to each other in a standard vocabulary, and provide proper diagnosis and treatment.

In conclusion, while it is difficult to define a microbial species, and made even more complex with the existence of horizontal gene transfer, it is essential to have one so that discourse on the microbes can occur, and research may move forward with findings from the past. The current definition of microbial species is operational and arbitrary, and may not reflect accurately any evolutionary process or biological function, but most importantly it is consistent. Moving forward, if the definition of a microbial species does change, it would also be interesting to keep track of what organisms get moved to a different classification. As previously mentioned, consistency is most important, and if classifications do end up changing, it would also be important to consider “backwards compatibility” with the labels somehow, so that research from the past is still accessible. Hopefully, with improving methods of defining microbial species, it will be possible one day to have our definition encompass both consistency and biological meaning.

References

  1. Callahan B. J., McMurdie P. J., & Holmes S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME, 11, 2639-2643.
  2. Thompson J. R., Pacocha S., Pharino C., Klepac-Ceraj V., Hunt D. E., Benoit J., Sarma-Rupavtarm R., Distel D. L., & Polz M. F. (2005). Genotypic Diversity Within a natural Coastal Bacterioplankton population. Science, 307, 1311-1313.
  3. Falkowski P. G., Fenchel T. & Delong E. F. (2008). The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science, 320, 1034-1039.

Projects

Project_1

Abstract

To study the diversity and biochemical responses of microbial communities in the context of oxygen minimum zones (OMZs), Saanich Inlet was used as a model ecosystem from which water samples were collected at seven major depths spanning the oxycline. A metagenomic study was conducted in which genomic DNA was extracted from the water samples, PCR amplified, assembled into contiguous sequences, and processed into operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) using mothur and QIIME2 pipelines. Based on OTU and ASV results, we chose to focus in on Cyanobacteria as our taxon of interest. Both mothur and QIIME2 data produced alpha diversity values based on Shannon’s diversity index that suggest a decrease of Cyanobacterial abundance as depth increases. In addition, Cyanobacterial abundance significantly differs across depth and oxygen concentration according to both mothur and QIIME2 data; specifically, abundance significantly decreases at deeper depths and in environments with lower concentrations of oxygen. Within the Cyanobacteria phylum, there are 15 OTUs and 51 ASVs across all samples. Abundance of five OTUs from the mothur pipeline showed significant changes across both depth and oxygen. In contrast, abundance of none of the ASVs from the QIIME2 pipeline showed significant changes across depth, although 17 ASVs showed significant changes across oxygen concentrations. An increase of Cyanobacterial abundance at shallow depths may be explained in part by their ability to absorb red and orange light at the upper boundaries of the water column. This is also supported by the significant change in oxygen and chlorophyll A concentrations across the depth profile. Another explanation for low Cyanobacterial abundance at lower depths may be due to changes in temperature; where temperature drops below 10oC at 100m, Cyanobacterial growth stops completely.To study the diversity and biochemical responses of microbial communities in the context of oxygen minimum zones (OMZs), Saanich Inlet was used as a model ecosystem from which water samples were collected at seven major depths spanning the oxycline. A metagenomic study was conducted in which genomic DNA was extracted from the water samples, PCR amplified, assembled into contiguous sequences, and processed into operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) using mothur and QIIME2 pipelines. Based on OTU and ASV results, we chose to focus in on Cyanobacteria as our taxon of interest. Both mothur and QIIME2 data produced alpha diversity values based on Shannon’s diversity index that suggest a decrease of Cyanobacterial abundance as depth increases. In addition, Cyanobacterial abundance significantly differs across depth and oxygen concentration according to both mothur and QIIME2 data; specifically, abundance significantly decreases at deeper depths and in environments with lower concentrations of oxygen. Within the Cyanobacteria phylum, there are 15 OTUs and 51 ASVs across all samples. Abundance of five OTUs from the mothur pipeline showed significant changes across both depth and oxygen. In contrast, abundance of none of the ASVs from the QIIME2 pipeline showed significant changes across depth, although 17 ASVs showed significant changes across oxygen concentrations. An increase of Cyanobacterial abundance at shallow depths may be explained in part by their ability to absorb red and orange light at the upper boundaries of the water column. This is also supported by the significant change in oxygen and chlorophyll A concentrations across the depth profile. Another explanation for low Cyanobacterial abundance at lower depths may be due to changes in temperature; where temperature drops below 10oC at 100m, Cyanobacterial growth stops completely.

Introduction

Oxygen minimum zones (OMZs) are areas in the ocean where dissolved oxygen concentrations fall below 20 \(\mu\)M (1). Due to temperature increases and other effects caused by global warming, OMZs are expanding at a notable rate. Saanich Inlet, a seasonally anoxic fjord off the coast of British Columbia, is a model ecosystem for studying the diversity and biochemical responses of microbial communities to the hypoxic environments commonly observed in OMZs (1, 2). In particular, Saanich Inlet has been used to model the metabolic coupling and symbiotic interactions that occur in OMZs (3). The inlet undergoes recurring cycles of water column stratification and deep water renewal, rendering it a model ecosystem for studying microbial responses to changes in ocean deoxygenation levels (4). Increased levels of primary productivity in ocean surfaces during the spring season, as well as the limited mixing which occurs between the basin and surface waters both result in the development of an anoxic body of water with increasing depth in the Inlet (2). These anoxic regions become populated with chemolithoautotrophs, and eventually lead to a decrease in aerobically respiring organisms found deeper within these zones. Past studies have demonstrated that these kinds of metabolic shifts generally lead to a loss of nitrogen along with the production of greenhouse gases, most notably methane (CH4) and nitrous oxide (N2O) (1).

In order to investigate the changes that occur in these OMZs, water samples of various depths were collected from Saanich Inlet. Genomic DNA was extracted from these to conduct a metagenomics study, allowing to overcome the barrier of uncultivability of these samples and enable a more thorough exploration of the relationship which exists between the microbes and their communities based on genetic distribution of metabolic processes (5, 6, 7, 8). The extracted DNA is sequenced to generate raw data, which can then be assembled into contiguous sequences. These contigs generated by amplicon sequencing are then compared to a sequence database to determine the microbial taxa present in the environment at each water depth. This involves processing the sequencing data, and there currently exists two methods for this type of data analysis: operational taxonomic units (OTUs) and amplicon sequence variants (ASVs). OTU based pipelines work based on clustering reads which differ by less than a fixed dissimilarity threshold (9). This allows more data to be kept, although some may not be representative of the actual taxa in the community. On the contrary, ASV based pipelines resolve these sequence variants by inferring biological sequences in the sample prior to amplification and possible sequencing errors, and are able to distinguish variants which differ by even one nucleotide (9). This treats each ASV as individual species, though has the potential to discard more data and bias towards sequences that are less error-prone.

The objective of this paper is to analyze the data generated from both OTU and ASV pipelines in order to decide which produces more logical inferences, and ultimately determining which pipeline would be preferred to carry out future analysis of collected water samples. The taxonomy of interest which was selected for this comparison was the phylum Cyanobacteria. Cyanobacteria was selected as there are sufficient numbers of OTUs and ASVs to make sound comparisons, but not so much so that computation-wise it would be infeasible.

Methods

Sampling

Samples were obtained on Saanich Inlet Cruise 72 and taken from seven major depths spanning the oxycline: 10, 100, 120, 135, 150, 165 and 200 m. Waters were filtered, and genomic DNA was extracted. Further sampling details can be found in (3).

DNA Sequencing

Samples were PCR amplified using the 515F and 808R primers, then sequenced according to the standard operating protocol on an Illumina MiSeq platform with Phred33 quality scores.

Data Processing

Sequences were processed using either mothur or QIIME2 as follows:

Mothur Pipeline: mothur was used to clean-up the data. In brief, paired end reads were combined into contigs using their overlapping regions. Low quality sequences, useless sequence data, chimeric sequences and singletons were removed. OTUs were then determined at 97% similarity. OTUs were classified using the SILVA databases, and the taxonomies for each OTU were condensed. The OTU table, taxonomy data and sample metadata were subsequently cleaned up and combined into a phyloseq object.

QIIME2 Pipeline: Demultiplexed sequences were imported into QIIME as manifest reads. QIIME was used to clean up the data along with ASV determination in one step. Sequence quality was visually evaluated, and sequence quality trimming was conducted. All other trimming/filtering parameters were left as default. ASV determination was completed using the Dada2 protocol. ASV classification was completed using the Silva version 119 database at 99% similarity. The ASV table, taxonomy data and sample metadata were subsequently cleaned up and combined into a phyloseq object.

Data Analysis

The aforementioned phyloseq objects were imported into R version 3.4.3 (Windows) or 1.1.383 (Mac). The tidyverse, phyloseq, magrittr, knitr and cowplot packages were loaded and used to complete the data analysis. Data was piped into linear models and ANOVA tests to determine statistical significance at the 95% confidence level.

Environment setup and Data Cleaning
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## Warning: package 'cowplot' was built under R version 3.4.4
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
## You set `rngseed` to FALSE. Make sure you've set & recorded
##  the random seed of your session for reproducibility.
## See `?set.seed`
## ...
## 614OTUs were removed because they are no longer 
## present in any sample after random subsampling
## ...
## You set `rngseed` to FALSE. Make sure you've set & recorded
##  the random seed of your session for reproducibility.
## See `?set.seed`
## ...
## 6OTUs were removed because they are no longer 
## present in any sample after random subsampling
## ...

Results

  1. How does microbial community structure change with depth and oxygen concentration?

Alpha-diversity comparison:
Mothur
As shown in Figure 1, alpha-diversity by Shannon’s diversity index based on mothur shows an overall decreasing trend as depth increases. Specifically, a stable trend is observed from depths of 0-100m, starts decreasing at 100-150m and stabilizes at 150-200m. Figure 1 also shows an overall increasing trend as oxygen increases. Specifically, Shannon’s diversity index increases from 2.3-4.25 at oxygen concentrations of 0-40\(\mu\)M and decreases from 4.25-3.9 at oxygen concentrations of 40-220\(\mu\)M. It was also observed that Shannon’s diversity index is higher in oxic conditions (3.84 ± 0.45) than anoxic conditions (2.39 ± 0.07).

QIIME2
The trends of Shannon’s diversity index across depth and oxygen based on QIIME2 data are similar to those of mothur data. However, QIIME2 pipeline produces higher Shannon’s diversity index than mothur does. Shannon’s diversity index increases from 2.9-5.2 at oxygen concentrations of 0-40\(\mu\)M and decreases from 5.2-5.1 at oxygen concentrations of 40-200\(\mu\)M. Figure 2 shows that Shannon’s diversity index is higher in oxic conditions (4.80 ± 0.43) than anoxic conditions (3.15 ± 0.18).

Alpha-diversity of mothur data
## `geom_smooth()` using method = 'loess'

Table 1. Average and standard deviation of alpha-diversity by oxic/anoxic with mothur data
Statistic Oxic Anoxic
Average 3.8401008 2.3884700
Standard deviation 0.4523233 0.0666717
Alpha-diversity of QIIME2 data
## `geom_smooth()` using method = 'loess'

Table 2. Average and standard deviation of alpha-diversity by oxic/anoxic with QIIME2 data
Statistic Oxic Anoxic
Average 4.7959464 3.1546427
Standard deviation 0.4296851 0.1784873

Taxa presence and abundance:
Mothur
31 taxa in the phylum level are detected by mothur pipeline (Figure 3). These taxa, however, have abundance at different magnitudes. Among of them, Proteobacteria is the most predominant phylum in all samples with the highest average abundance over 75. On the contrary, phylum Peregrinlbacteria has abundance no larger than 0.001 in the seven samples (Figure 3). Additionally, different taxa have distinct changes in abundance across depth. For instance, both Thaumarchaeota and Verrucomicrobia reach their maximum abundance at depth of 100m and and gradually decrease in abundance until 200m, while Latescibacteria and Fibrobacteres are almost undetectable in shallow water and their abundances increase dramatically at depth of 200m.

QIIME2
QIIME2 pipeline detects 29 known taxa and unknown taxa in phylum level (Figure 4). Proteobacteria is still the most abundant phylum across samples. Taxa Chlorobi and Candidate division OP3 have the smallest abundance no larger than 0.004. Different pipelines may result in different changes in abundance for the common taxa shared by mothur and QIIME2 data. Although the change patterns of phylum Actinobacteria in abundance across depth are the same in both datasets, Chloroflexi abundance increases gradually with depth in QIIME2 data different from its change pattern in mothur data, in which abundance declines at depth from 100m to 135m and increases gradually until 200m.

  1. Does your taxon of interest significantly differ in abundance with depth and/or oxygen concentration?

Mothur
The difference in cyanobacteria abundance within depth or oxygen was estimated by the linear model using mothur processed data. The statistical results show that abundance of cyanobacteria is significantly different with depth (p = 0.01263) and oxygen (p = 0.00012). However linear models in Figure 5 indicate completely distinct trends of cyanobacteria abundance across depth and oxygen, where there is a decrease in abundance as depth increases and an increase in abundance as oxygen concentrations increases respectively.

QIIME2
For data processed by QIIME2 pipeline, ANOVA tests indicate cyanobacteria abundance in the seven samples has significantly difference across depth (p = 0.014) and oxygen (p = 0.013).The linear models in Figure 6 reveal that cyanobacteria abundance decreases at deeper water or in the environment with insufficient oxygen.

## 
## Call:
## lm(formula = Abundance ~ Depth_m, data = .)
## 
## Residuals:
##       1       2       6       4       5       3       7 
##  46.352 -50.700  20.041 -16.609  -3.284 -39.933  44.132 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 173.5309    39.3958   4.405  0.00699 **
## Depth_m      -1.0883     0.2864  -3.800  0.01263 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.31 on 5 degrees of freedom
## Multiple R-squared:  0.7428, Adjusted R-squared:  0.6914 
## F-statistic: 14.44 on 1 and 5 DF,  p-value: 0.01263
## 
## Call:
## lm(formula = Abundance ~ O2_uM, data = .)
## 
## Residuals:
##       1       2       6       4       5       3       7 
##   6.768 -17.048  19.375  -4.217  12.375 -22.627   5.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.3745     7.4690   -0.72 0.504007    
## O2_uM         0.9582     0.0885   10.83 0.000117 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.87 on 5 degrees of freedom
## Multiple R-squared:  0.9591, Adjusted R-squared:  0.9509 
## F-statistic: 117.2 on 1 and 5 DF,  p-value: 0.0001167

## 
## Call:
## lm(formula = Abundance ~ Depth_m, data = .)
## 
## Residuals:
##        1        4        5        3        2        6        7 
##   63.270  160.404   72.980  -30.172 -221.273  -44.444   -0.766 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 687.7807   122.7658   5.602   0.0025 **
## Depth_m      -3.3051     0.8925  -3.703   0.0140 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131.8 on 5 degrees of freedom
## Multiple R-squared:  0.7328, Adjusted R-squared:  0.6794 
## F-statistic: 13.71 on 1 and 5 DF,  p-value: 0.01395
## 
## Call:
## lm(formula = Abundance ~ O2_uM, data = .)
## 
## Residuals:
##         1         4         5         3         2         6         7 
##    0.5171  190.2269  105.9213   18.5371 -121.0450  -61.0787 -133.0787 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 159.0787    57.3201   2.775   0.0391 *
## O2_uM         2.5772     0.6792   3.794   0.0127 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 129.5 on 5 degrees of freedom
## Multiple R-squared:  0.7422, Adjusted R-squared:  0.6907 
## F-statistic:  14.4 on 1 and 5 DF,  p-value: 0.0127

  1. Within your taxon, what is the richness (number of OTUs/ASVs)?

Mothur
Across all samples, there are 15 OTUs within cyanobacteria phylum. Table 3 presents the numbers of OTUs within cyanobacteria phylum for each sample. Most of samples contain 3-5 OTUs within cyanobacteria, except Saanich_120 and Saanich_200 which only have 1 and 0 cyanobacteria OTUs respectively.

QIIME2
There are 51 ASVs within cyanobacteria phylum across all samples. The number of ASVs within cyanobacteria phylum for each sample is shown in Table 2. Saanich_010, Saanich 120, and Saanich_135 have relatively high ASV number equal to or over 15. However, it is important to note that there are no singletons within the ASV dataset, and the function used to estimate richness, “estimate_richness” from the phyloseq library is highly dependent on the number of singletons, and warns of unreliable or wrong results in the absence of singletons in the data.

Table showing richness of OTUs using mothur data and ASVs using QIIME2 data
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 15 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 22 sample variables ]
## tax_table()   Taxonomy Table:    [ 15 taxa by 7 taxonomic ranks ]
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 51 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 22 sample variables ]
## tax_table()   Taxonomy Table:    [ 51 taxa by 7 taxonomic ranks ]
## Warning in estimate_richness(., measures = c("Observed")): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
## 
## We recommended that you find the un-trimmed data and retry.
Table 3. OTUs/ASVs across depth
Depth_m OTU ASV
Saanich_010 10 5 17
Saanich_100 100 4 8
Saanich_120 120 1 15
Saanich_135 135 4 17
Saanich_150 150 3 11
Saanich_165 165 3 5
Saanich_200 200 0 2
  1. Do the abundances of OTUs/ASVs within your taxon of interest change significantly with depth and/or oxygen concentration?

Using the linear model for statistical interpretation, after correcting the p-value for multiple comparisons, the abundance of OTUs 0189, 0658, 1104, 3852, and 4312 from the mothur pipeline within the cyanobacteria phylum changed significantly with both depth and oxygen. Interestingly, the abundance of ASVs in the QIIME2 pipeline did not have any significant changes across the depth profiles after correcting for the p-value. However, there were 17 ASVs that had a significant abundance change with respect to oxygen concentrations.

Mothur
Table 4. Correlation data of Cyanobacteria OTUs with significant differences across depth using mothur data
Estimate Std. Error t value P_value Adjusted_P
Otu0189 -0.9610475 0.2669442 -3.600181 0.0155403 0.0491962
Otu0658 -0.0685106 0.0190455 -3.597207 0.0155891 0.0491962
Otu1104 -0.0583306 0.0164341 -3.549356 0.0163987 0.0491962
Otu3852 -0.0159083 0.0044820 -3.549356 0.0163987 0.0491962
Otu4312 -0.0053028 0.0014940 -3.549356 0.0163987 0.0491962

Table 5. Correlation data of Cyanobacteria OTUs with significant differences across oxygen using mothur data
Estimate Std. Error t value P_value Adjusted_P
Otu0189 0.8586534 0.0788686 10.88714 0.0001136 0.0004035
Otu0658 0.0611354 0.0058159 10.51177 0.0001345 0.0004035
Otu1104 0.0522766 0.0049096 10.64789 0.0001264 0.0004035
Otu3852 0.0142572 0.0013390 10.64789 0.0001264 0.0004035
Otu4312 0.0047524 0.0004463 10.64789 0.0001264 0.0004035

QIIME2
## [1] "None of ASV has significantly different abundance acrossing depth with QIIME2 data"

Table 6. Correlation data of Cyanobacteria ASVs with significant differences across oxygen using QIIME2 data
Estimate Std. Error t value P_value Adjusted_P
Asv12 0.0855435 0.0080338 10.647891 0.0001264 0.0004605
Asv144 0.1568297 0.0147287 10.647891 0.0001264 0.0004605
Asv294 0.4104610 0.0429693 9.552421 0.0002128 0.0006784
Asv404 0.1948490 0.0182993 10.647891 0.0001264 0.0004605
Asv663 0.9749372 0.0956990 10.187539 0.0001564 0.0005316
Asv790 0.0095048 0.0008926 10.647891 0.0001264 0.0004605
Asv945 0.0950483 0.0089265 10.647891 0.0001264 0.0004605
Asv1055 0.0380193 0.0035706 10.647891 0.0001264 0.0004605
Asv1085 0.1710870 0.0160677 10.647891 0.0001264 0.0004605
Asv1141 0.0665338 0.0062485 10.647891 0.0001264 0.0004605
Asv1209 0.0285145 0.0026779 10.647891 0.0001264 0.0004605
Asv1454 0.1283152 0.0120508 10.647891 0.0001264 0.0004605
Asv1578 0.0285145 0.0026779 10.647891 0.0001264 0.0004605
Asv1728 0.2281160 0.0214236 10.647891 0.0001264 0.0004605
Asv1817 0.2946498 0.0276721 10.647891 0.0001264 0.0004605
Asv2018 0.3390505 0.0720444 4.706133 0.0053079 0.0159238
Asv2336 0.1045531 0.0098191 10.647891 0.0001264 0.0004605

  1. Are the answers to the above the same using mothur and QIIME2 processed data?

In terms of differences between the two pipelines, the overall trend between both datasets were similar, however the statistical interpretations of the datasets varied. For starters, the Shannon diversity for the whole microbial community processed with mothur was generally smaller than that estimated by QIIME 2 processed data. Additionally, ANOVA tests indicated that the Shannon diversity had no significant change across depth from the mothur pipeline (p = 0.054). In contrast, there was statistical significance with the Shannon diversity across depth from the QIIME2 dataset (p = 0.022). Within the Cyanobacterial taxon, there were 51 ASVs calculated from the QIIME2 pipeline compared to 15 OTUs from the Mothur pipeline. Interestingly, while there is a difference in the richness between both datasets, 5 of the OTUs had a significantly different abundance with respect to depth and oxygen profiles, while none of the ASVs had a significant change in abundance across depth and 17 ASVs which had a significant difference in abundance across oxygen concentrations. Despite all these variations, the Cyanobacterial taxon itself had a significant difference in abundance with respect to depth and oxygen profiles along the water column across both pipelines.

ANOVA on alpha-diversity of mothur data across depth
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Depth_m      1  2.355  2.3554   6.265 0.0543 .
## Residuals    5  1.880  0.3759                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ANOVA on alpha-diversity of QIIME2 data across depth
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Depth_m      1  3.563   3.563   10.65 0.0224 *
## Residuals    5  1.673   0.335                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Discussion

The significant differences for Cyanobacteria abundance across depth and oxygen may be caused by photosynthesis and water temperature. Bacteria of the phylum Cyanobacteria obtain their energy through photosynthesis. These phototrophs are characterized by phycocyanin, a bluish pigment, which functions as an auxiliary light-harvesting protein complex to chlorophyll. Phycocyanin absorbs orange and red light at approximately 620nm and fluoresces at about 650nm depending on the species (10). This is particularly interesting because orange and red light are typically absorbed within the first 50m of a water column (11). From our data in Fig 5 & 6, it was found that there are significant differences in the abundance of Cyanobacteria across the Saanich Inlet depth profiles in both the mothur and QIIME2 pipelines. It is likely that this significance exists as a result of Cyanobacteria thriving at the upper boundaries of the water column, where they are still able to absorb red and orange light that is essential for Cyanobacterial photosynthesis. Furthermore, these findings are supported by the oxygen and fluorescence profiles across the water column (Fig 11). There is a significant difference in the concentration of oxygen and chlorophyll A across depth profiles. Higher concentrations of chlorophyll A and oxygen were found within the top 50m, which indicates increased photosynthetic activity. Moreover, there was a significant difference in the abundance of Cyanobacteria across oxygen concentrations and chlorophyll A concentrations for both pipelines; high oxygen and chlorophyll A concentrations were associated with a larger abundance. With Cyanobacterial photosynthesis contributing a substantial proportion of oxygen to Earth’s atmosphere, it is not surprising that a high abundance of Cyanobacteria is associated with a high concentration of oxygen and chlorophyll A, and a shallow depth. Moreover, it has been reported that the growth rate of cyanobacteria is significantly influenced by temperature. Cyanobacteria were observed to have a lower growth rate at colder temperatures. For marine cyanobacteria, the optimal growth temperature range is 20 - 27.5oC; at these temperatures, cyanobacteria grow at a rate of 0.8 d-1 (12). When the temperature dropped to approximately 15oC, the growth rate of cyanobacteria slowed to 0.22 d-1. Interestingly, when temperature dropped below 10oC, cyanobacterial growth rates came to a complete stop (13). Therefore, in our study, the decline of cyanobacteria abundance with depth may at least partly attributed to the decreasing temperature. According temperature data for each sample, the temperature is close to 13oC at 10m, and decreases to about 9oC when at depths below 100m. Hence, the growth rate of cyanobacteria at depths below 100m is substantially slower than the growth rate at the water’s surface. This leads to a lower abundance at lower depths.

Implications of potential differences in pipelines for microbial ecology make it difficult to make conclusive statements in research and discovery, since we become unable to differentiate actual biological differences seen and differences due to a particular pipeline being used. This could also suggest that one pipeline is more suited to the dataset. In fact, this difference could also be exploited, and manipulated so that a pipeline is selected based on the results that it gives, rather than the more appropriate pipeline for the given dataset.

In the context of this project, the main difference between the two pipelines is based on whether the pipeline produces OTUs (mothur) or ASVs (QIIME2). Both of them use different clustering algorithms to determine “true” sequences, and as a result, there are usually far more ASVs in the QIIME2 pipeline than OTUs created with the mothur pipeline. Therefore, when doing downstream analysis of the pipeline results, this may be one of the reasons why there is a large disparity in between the numbers of sequences and quality of the sequences produced even when using the same initial data.

However, when it came to counting abundances within our taxon, cyanobacteria, there was a far lower abundance seen in qiime2 results than mothur results which cannot be explained by having more numerous ASVs than OTUs. Interestingly, when running the estimate_richness function on our qiime2 data for determining abundance, this resulted in the warning that our data provided did not have any singletons (supposedly in the output), and that results are probably unreliable or wrong. This error did not occur with running this function on mothur data. Further analysis of why this error is seen with qiime2 data should be done before fully trusting the results of this function with qiime2 data.

In subsequent analyses with these two pipelines, perhaps a more in-depth analysis of each function available with the phyloseq package should be tested. A dataset of a well-studied and known community could be used so that the output of the functions can be compared with respect to the pipeline used. Comparisons can be made for the outputs of each pipeline, and results from the mothur and QIIME2 pipelines can be assessed with reference to the expected results. Use cases should also be considered, and standard use cases for either pipeline should be indicated so exploitation of pipelines for favourable results does not occur. Mothur may be more suited to a particular dataset, while QIIME2 could be more appropriate for another dataset that should be used with a “denoising” algorithm.

Future directions for this project could possibly involve the analysis of water samples from other OMZs at various depths, along with observation of the Cyanobacteria data present in those samples to see if the relationships are exhibited between oxygen, nitrogen, and the phyla population as with this study. These can also be analyzed once again with both mothur and QIIME2 for further comparison between the pipelines.

References

  1. Walsh DA, Zaikova E, Howes CG, Song YC, Wright JJ, Tringe SG, Hallam SJ. 2009. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326(5952): 578-582.
  2. Torres-Beltr?n M, Hawley AK, Capelle D, Zaikova E, Walsh DA., Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Nature scientific data 4(170159).
  3. Hawley AK, Torres-Beltr?n M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchek O, Gies EA, Fairley D, Malfatii SA, Norbeck AD, Brewer HM, Pasa-Tolic, L, del Rio TG, Suttle CA, Trige S, Hallam SJ. Data Descriptor: A compendium of multi-omic sequence information from the Saanich Inlet water column. Nature scientific data 4(170160).
  4. Hallam SJ, Torres-Beltr?n M, Hawley AK. Comment: Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Nature scientific data 4(170158).
  5. National Research Council. 2007. The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press, Washington, DC.
  6. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Tringe SG. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35: 725-731.
  7. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.
  8. Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS computational biology 6(2), e1000667.
  9. Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME journal 11(12): 2639-2643.
  10. Simis SG, Huot Y, Babin M, Seppala J, and Metsamaa L. 2012. Optimization of variable fluorescence measurements of phytoplankton communities with cyanobacteria. Photosynthesis research 112(1): 13-30.
  11. Light in the ocean (n.d.) [Online]. Link: https://manoa.hawaii.edu/exploringourfluidearth/physical/ocean-depths/light-ocean.
  12. Boyd PW, RynearsonTA, Armstrong EA, Fu F, Hayashi K, Hu Z, Hutchins DA, Kudela RM, Litchman E, Mulholland MR, Passow U, Strzepek RF, Whittaker KA, Yu E, and Thomas MK. (2013). Marine phytoplankton temperature versus growth responses from polar to tropical waters - outcome of a scientific community-wide study. PLoS ONE 8(5): e63091.
  13. Berg M, Sutula M. 2015. Factors affecting the growth of cyanobacteria with special emphasis on the Sacramento-San Joaquin Delta. Southern California Coastal Water Research Project Technical Report 869.

Project_2

Abstract

Genes encoding key steps in the nitrogen cycle are well defined and provide a basis for functional anchor screening to determine their distribution across prokaryotic taxa. In this study, Saanich Inlet was used as a model ecosystem to investigate the metabolic coupling and symbiotic interactions that influence the nitrogen cycle in oxygen minimum zones. Water samples were collected at seven major depths spanning the oxycline, and genomic DNA and RNA were extracted and sequenced. The resulting reads were processed, assembled, and analyzed using the Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) pipeline to reconstruct the nitrogen cycle along defined redox gradients in Saanich Inlet. The narG gene was then investigated in detail. Its DNA levels were found to increase with depth while RNA levels decreased with depth. Proteobacteria contributed most narG DNA, while Actinobacteria contributed most RNA. Further, narG RNA abundance increased along with nitrite concentrations in the Inlet, but had the opposite relationship with nitrate concentrations; this was the expected result given that narG mediates the conversion of nitrate to nitrite. Overall, narG is responsible for a conversion taking place the beginning of the nitrogen cycle, and this provides reasoning for its presence at the given depths, while providing a key piece to exemplify evolutionary and environmental reasoning for the distributed metabolism seen in the nitrogen cycle.

Introduction

Oxygen minimum zones (OMZs) are regions in the ocean where dissolved oxygen concentrations fall below 20 \(\mu\)M (1). Due to global temperature increases and other effects caused by climate change, OMZs are increasing at a significant rate. Saanich Inlet is a seasonally anoxic fjord off the coast of British Columbia and a model ecosystem for studying the diversity and biochemical responses of microbial communities to the hypoxic environments commonly observed in OMZs (1, 2). In particular, Saanich Inlet has been used to model the metabolic coupling and symbiotic interactions that occur in OMZs (3). The inlet undergoes recurring cycles of water column stratification and deep water renewal, making it a useful model for studying microbial responses to changes in oceanic oxygenation levels (4). Increased levels of primary productivity in ocean surfaces during the spring season, along with the limited mixing which occurs between the basin and surface waters, both result in the development of an anoxic body of water with increasing depth in the Inlet (2). These anoxic regions become highly populated with chemolithoautotrophs, eventually leading to a decrease in aerobically respiring organisms found deeper within these zones. Past studies have shown that these types of metabolic shifts usually lead to a loss of nitrogen along with the production of notable greenhouse gases such as methane (CH4) and nitrous oxide (N2O) (1). Some species have also been shown to engage in sulfide (H2S) oxidation and nitrate (NO3-) reduction pathways within these zones (5).

The nitrogen cycle–the biogeochemical cycle by which nitrogenous compounds are interconverted between chemical forms for environmental circulation–consists mainly of nitrogen fixation, nitrification and denitrification, and has been catalyzed by microorganisms long before the appearance of humans (6). Even today, much of it is still dictated by the diverse microbial niches present in the surrounding environment. An example of this control can be seen in denitrification–the conversion of NO3- to nitrogen gas (N2)–carried out by denitrifying bacteria. Denitrication is a highly important process as it prevents the accumulation of nitrogen compounds to toxic levels that could lead to eutrophication, and maintains the homeostasis of nitrogen distribution between soil and atmosphere. Having an atmospheric residence time of approximately 1 billion years, nitrogen gas (N2) is highly inert and is not accessible for the synthesis of proteins and nucleic acids, though this problem is easily solved by the conversion of N2 to NH4+ through nitrogen fixation (6).

Recent literature has highlighted that microbial taxa do not necessarily integrate whole elemental cycles into their individual metabolic pathways, but often divide these reactions among the community so that consortia of taxa must rely on each other to fulfill their metabolic requirements (2). The aim of this project was to characterize the involvement of our gene of interest, narG, which is known for its role in the reduction of NO3- to NO2-, in the denitrification cycle.

Methods

Water samples were collected from Saanich Inlet at depths 10, 100, 120, 135, 150, 165, and 200m spanning the oxycline during Saanich Inlet Cruise 72. The samples were subsequently filtered through a 0.22 \(\mu\)m Sterivex filter to collect biomass and both genomic DNA and RNA were extracted from these samples. The extracted RNA was converted into cDNA. Both total genomic DNA and cDNA were used to construct the shotgun Illumina libraries. Sequencing data were generated on the Illumina HiSeq platform with 2x150bp technology. Further sampling and sequencing details can be found in A compendium of multi-omic sequence information from the Saanich Inlet water column by Hawkley et al. (2017).

The IMG/M pipeline was then used to process the resulting reads. Metapathways 2.5 was used to assemble and process the metagenomes. A beta version of Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) pipeline was used to reconstruct the nitrogen cycle along defined redox gradients in Saanich Inlet using Google Cloud services. iTOL version 4.0 was used to generate phylogenetic trees based on the processed data. The contig maps were loaded into R version 3.4.3. The Tidyverse, Cowplot and Phyloseq packages were loaded and used to complete the data analysis.

Results

Environment setup and Data Cleaning

Loading, parsing (initial data contains data for all genes - we are only interested in narG), renaming data. We only require the taxonomy, abundance, and query data.

Data manipulation into a single data frame

## Warning: Expected 7 pieces. Missing pieces filled with `NA` in 8302
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].

Warning above were ignored, because not all queries could be classified down to species level. These cells are filled as NA

Final data table
Table 1. Final data of narG gene
1. How does the DNA abundance of narG gene differ with depth?

Figure 1 indicates that the DNA abundance of the narG gene has an overall increasing trend with depth. The DNA abundance of the narG gene increases gradually from 21.40 to 990.72 at depth from 10 m to 150 m. However, it declines to 249.21 at depth of 165 m, but increases again to 717.90 at depth of 200 m.

2. What taxa are responsible for narG? Are they the same with depth? With DNA versus RNA?

RNA abundance of the narG gene with depth does not match the trend of DNA abundance. Figure 2 shows that RNA abundance of the narG gene increases significantly from 10 m to 100 m, and reaches its second highest level of 3585.45 at 100 m. Interestingly, the RNA abundance fluctuates between a range of 2500-3700 at depths 100-135 m. After that, it decreases gradually until 200 m. On the other hand, RNA abundance of narG gene is much larger than the DNA abundance at depth of 100-165 m, while less narG RNA is found in the samples from depth of 10 m and 200 m.

3. What taxa are responsible for narG gene? Are they the same with depth? With DNA versus RNA?

As shown in Figure 3, Actinobacteria, candidatus Omnitrophica, Chlorobi, Euryarchaeota, metagenomes, Proteobacteria, and unclassified Bacteria are responsible for narG DNA. Among of them, Actinobacteria, candidatus Omnitrophica, Chlorobi, Euryarchaeota, and metagenomes have evenly distributed DNA abundance across depth above 100 m. However, the distribution of Proteobacterial DNA abundance with depth follows a similar trend with that of narG DNA abundance, in which abundance increases from depth of 10 m to 150 m, declines at depth of 150-165 m, and finally increases again at 200 m. On the contrary, DNA abundance of unclassified Bacteria is the same at depth below 165 m but increases slightly at 200 m.

Actinobacteria, Proteobacteria, and unclassified Bacteria make dominant contributions to narG RNA among the 11 phyla. Actinobacteria is the predominant phylum for narG RNA contribution at depth of 100 m and 120 m, but its RNA abundance decreases gradually at deeper water and eventually becomes undetectable at depth of 200 m. Both Proteobacteria and unclassified Bacteria have observed RNA at the seven different depth, although their RNA abundances are not distributed in the same trend with depth. Proteobacterial RNA abundances at depth of 150 m and 200 m are higher than that from other depths, while unclassified Bacteria have the lowest RNA abundance at 150 m and 200 m. Additionally, unclassified Bacteria have RNA abundance increasing with depth at shallow water and reaching the highest point at depth of 135 m.

## Warning: Removed 7265 rows containing missing values (geom_point).

4. How does the abundance of narG gene relate to nitrogen species in Saanich?

The narG gene has significant impacts on nitrite (NO2-) and nitrate (NO3-) concentration across depth in Saanich. It is responsible for the reduction of nitrate to nitrite. As a result, when the RNA abundance of narG gene remains at a high level at depths 100-135m (Figure 2), nitrate concentration stops increasing and declines dramatically from its highest point at depth of 100 m (Figure 4D). On the other hand, due to poor nitrification in deeper water, although RNA of narG gene is less abundant at depth above 150 m, nitrate concentration continues decreasing with depth until to 0 at depth of 165 m. Similarly, narG is also one of the key factors for the fluctuation of nitrite concentration at depth 100-165 m. The low ammonium concentration in depth of 10-135 m indicates a high level of nitrification in the shallow water (Figure 4A), which is responsible for the significant increase of nitrate and the decrease of nitrite at depth 10-100 m. However, nitrite concentration does not keep decreasing with depth above 100m, instead, it remains around 0.09 \(\mu\)M as a result of high nitrate reduction level caused by the abundant RNA of narG gene. Therefore, narG gene abundance is closely related to nitrite and nitrate concentration in Saanich.

Discussion

In anthropogenic times, NO3- levels have significantly increased in environmental water systems resulting from agricultural activities (7). With increases in NO3- entering these systems, the accumulation of nitrogen compounds above threshold levels can lead to eutrophication. Microbe-driven processes such as denitrification, are able to counteract this phenomenon by converting NO3- to N2. In this context, our gene of interest, narG, specifically mediates the first step of this pathway. As the alpha subunit and catalytic domain of the membrane-bound dissimilatory nitrate reductase, narG is responsible for the conversion of NO3- to NO2-. NO2- will become the substrate of nirK and nirS, and through subsequent steps and enzymes, eventually lead to the production of N2 (8). Interestingly, it has been found that the process of transporting nitrate into the cytoplasmic location of the active site of NarG is inhibited by oxygen, making denitrification an anaerobic process (9). As a result, it is expected that there is greater amounts of narG transcription in OMZs compared to surface level waters. From figure 2, peak transcription of narG is found at 100 m where oxygen levels are approaching OMZ conditions (Figure 5). Additionally, the large abundance of narG transcription at 100 m and deeper can be explained with the corresponding abundance of NO3- substrate. Many denitrifying facultative anaerobic microorganisms respire NO3- in the absence of oxygen. Denitrification is considered to be the highest energy-yielding respiration system in anoxic environments (10). Increases in NO3- at anoxic depth leads to the reduction of nitrate to nitrite in the cytoplasm by narG. It has been found that this process is coupled to the translocation of protons into the periplasm which directly contributes to a proton-motive force for energy conservation (8). Therefore, it is not surprising that narG transcription levels are near absent at oxic depths where oxygen can still be used as the preferential terminal electron acceptor (TEA), and abundant at anoxic depths where NO3- can be used as an alternative TEA.

Metabolisms in the nitrogen cycle is widely distributed, such that each reaction step is catalyzed by a specific enzyme from a different bacterial species carrying the corresponding genes. This distribution may be the consequence of the distinct energy and nutrient sources which are available in different ecosystems, as well as other mechanisms, namely horizontal gene transfer, and adaptive gene loss.

In aerobic environments like shallow water and topsoil, nitrification is the predominant process of the nitrogen cycle, due to the availability of ammonia and nitrite as energy sources for nitrifying bacteria and the inhibitive effect of oxygen on denitrifiers (11,12). On the contrary, denitrification occurs in anoxic environments including OMZs and marine sediments, where nitrate serves as energy and nutrient source and the limitation of oxygen on denitrification is minimum (8). Besides, some bacteria evolve additional mechanisms to protect nitrogen-related enzymes from unfavorable environments. For example, Trichodesmium spp. and Nodularia spp. adopt the strategy that nitrogen fixation is separated from photosynthesis, in order to minimize oxygen suppression on nitrogenases, an enzyme for nitrogen fixation (8). Therefore, environmental conditions is one of the possible reason for the distributed metabolisms.

Another potential explanation for metabolic distribution of the nitrogen cycle is horizontal gene transfer (HGT). Bacteria may acquire nitrogen-related genes from other species through HGT and take up the niche of corresponding reactions in the nitrogen cycle. For instance, ammonia monooxygenase genes, encoding the key enzyme required for ammonia oxidation, are observed widely distributed among different bacterial species (6). As a result, if a certain bacterial species responsible for ammonia oxidation became extinct in a specific ecosystem, other species, carrying ammonia monooxygenase genes and able to perform the process of ammonia oxidation, would take up the niche of the extinct one. On the other hand, the retention of horizontally transferred gene is largely driven by selective pressure on nutrients and energy, and thus different processes in the nitrogen cycle are widely distributed among microorganisms and ecosystems (6).

Gene loss may also make contributions to the wide distribution of metabolisms in the nitrogen cycle. A hypothesis, called as Black Queen Hypothesis (BQH), is proposed recently to explain the dependency of free-living microorganisms having metabolic gene loss on other species (13). In this hypothesis, two groups of bacterial species are defined: “helper” and “beneficiaries”. The hypothesis predicts that in a microbial community, “beneficiaries”, which is selected to lose a costly and dispensable functions, will obtain the function from a fraction of other individuals in the community, until the function is sufficient to support the whole community (13). Based on the hypothesis, it can be explained that why nitrogen fixation is performed by a small fraction of microorganisms in the ocean. The possible reasons may be that genes responsible for nitrogen fixation were selectively lost during evolution and fixed N was turned into a public good (13).

This study paves the way for future investigations into metabolic coupling and symbiotic interactions in the nitrogen cycle. It appears that narG levels are regulated by oxygen levels and availability of the NO3- substrate; however, the effect of other environmental factors has yet to be related with gene abundance. According to Palmer et al., in sediment communities, Actinobacterial narG dominates in extreme environments characterized by low pH or high temperature, while Proteobacterial narG is present in pH-neutral soils (15). Given that Actinobacteria and Proteobacteria dominated DNA and transcript expression of narG in an aquatic context, it would be useful to determine whether these abundances are due to pH, temperature, or other measurable geochemical parameters.

Additionally, as narG is inhibited by its own product, nitrite, it may be interesting to see how narG abundances vary with nirK and nirS abundances, which consume nitrite. Interestingly, previous studies have noted that nirS was dominated by \(\beta\)-Proteobacteria, while nirK was dominated by \(\alpha\)-Proteobacteria (14). Our study results also suggest narG is dominated in part by Proteobacteria, although more in depth analysis into different classes of Proteobacteria have not been attempted. A deeper analysis into the abundance of narG in different classes of Proteobacteria may be necessary to establish a correlation between how the abundances of nirK, nirS, and narG vary within the Proteobacteria phylum.

References

  1. Walsh DA, Zaikova E, Howes CG, Song YC, Wright JJ, Tringe SG, Hallam SJ. 2009. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326(5952): 578-582.
  2. Torres-Beltran M, Hawley AK, Capelle D, Zaikova E, Walsh DA., Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Nature scientific data 4(170159).
  3. Hawley AK, Torres-Beltran M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchek O, Gies EA, Fairley D, Malfatii SA, Norbeck AD, Brewer HM, Pasa-Tolic, L, del Rio TG, Suttle CA, Trige S, Hallam SJ. Data Descriptor: A compendium of multi-omic sequence information from the Saanich Inlet water column. Nature scientific data 4(170160).
  4. Hallam SJ, Torres-Beltran M, Hawley AK. Comment: Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Nature scientific data 4(170158).
  5. National Research Council. 2007. The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press, Washington, DC.
  6. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.
  7. Smith CJ, Nedwell DB, Dong LF, Osborn AM. 2007. Diversity and Abundance of Nitrate Reductase Genes (narG and napA), Nitrite Reductase Genes (nirS and nrfA), and Their Transcripts in Estuarine Sediments. Applied and environmental microbiology, 73(11): 3612-3622.
  8. Kuypers MM., Marchant, HK, Kartal B. (2018). The microbial nitrogen-cycling network. Nature Reviews Microbiology.
  9. Moreno-Vivian C, Cabello P, Martinez-Luque M, Blasco R, Castillo F. (1999). Prokaryotic nitrate reduction: molecular properties and functional distinction among bacterial nitrate reductases. Journal of bacteriology, 181(21): 6573-6584.
  10. Strohm TO, Griffin B, Zumft WG, Schink B. (2007). Growth yields in bacterial denitrification and nitrate ammonification. Applied and environmental microbiology, 73(5): 1420-1424.
  11. Ward BB. 2008. Nitrification, p 2511-2518. In Jorgensen SE and Fath BD(ed), Ecological Processes, vol. 3 of Encyclopedia of Ecology, 5 vols. Oxford: Elsevier.
  12. Nakajima M, Hayamize T, Nishimura H. 1983. Effect of oxygen concentration on the rates of denitrification and denitrification in the sediments of an eutrophic lake. Water Resources, vol. 18, no. 3, p 335-337.
  13. Morris JJ, Lenski RE, Zinser ER. 2012. The black queen hypothesis: Evolution of dependencies through adaptive gene loss. mBio 3(2). DOI: 10.1128/mBio.00036-12.
  14. Yu Z, Yang J, Liu L. 2014. Denitrifier Community in the Oxygen Minimum Zone of a Subtropical Deep Reservoir. PLoS One 9(3): e92055.
  15. Palmer K, Horn MA. 2015. Denitrification Activity of a Remarkably Diverse Fen Denitrifier Community in Finnish Lapland Is N-Oxide Limited. PLoS One 10(4): e0123123.